{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "_cell_guid": "0b9ffbb2-8d33-eb0c-5a42-0733bdc3b74b",
    "_uuid": "e14f65723f5ac826104e861c45e58525740d8b2a"
   },
   "source": [
    "# Ames House数据集上的回归分析\n",
    "\n",
    "Ames房价预测是Kaggle平台上的一个竞赛任务，需要根据房屋的特征来预测亚美尼亚州洛瓦市（Ames，Lowa）的房价。其中房屋的特征x共有79维，响应值y为每个房屋的销售价格（SalePrice）。\n",
    "\n",
    "Kaggle官网上的任务说明请见：\n",
    "https://www.kaggle.com/c/house-prices-advanced-regression-techniques\n",
    "\n",
    "由于房屋的属性较多，且房屋属性类型各异，所以先对原数据集进行特征编码。特征工程过程请见FE_AmesHouse.ipynb\n",
    "\n",
    "\n",
    "本代码直接在编码后的数据集上进行线性回归及其各种正则算法进行验证。\n",
    "\n",
    "Kaggle平台上kernel区有各个竞赛参数者分享的代码，大家可以学习。\n",
    "下面的代码也参考了很多不同选手分享的代码，在此一并致谢。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "_cell_guid": "2ba3154a-c2aa-3158-1984-63ad2c0c786a",
    "_uuid": "5eb696b95780825e94ddb49787f9fa339fc3833b",
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 导入必要的工具包\n",
    "# 数据读取及基本处理\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "#模型\n",
    "from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV, ElasticNetCV\n",
    "\n",
    "#模型评估\n",
    "from sklearn.metrics import mean_squared_error\n",
    "\n",
    "#可视化\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "_cell_guid": "21fa35be-878b-b4f2-ef6e-68dc070b8bfa",
    "_uuid": "73aee228226be55c0c8a6e4fcbf818c56cd94926"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "train : (1456, 344)\n",
      "test : (1459, 344)\n"
     ]
    }
   ],
   "source": [
    "# 读入数据\n",
    "train = pd.read_csv(\"AmesHouse_FE_train.csv\")\n",
    "print(\"train : \" + str(train.shape))\n",
    "\n",
    "test = pd.read_csv(\"AmesHouse_FE_test.csv\")\n",
    "print(\"test : \" + str(test.shape))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_cell_guid": "a1ae46d1-8787-67a3-2f69-6497c97320eb",
    "_uuid": "16fa6600f9877c9607060f9e50de6eaa9bc1c766"
   },
   "source": [
    "**准备训练数据**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "_cell_guid": "3685f20a-c332-7ab0-cced-03d639d28833",
    "_uuid": "39e8e6e7fa4b7f6e628be0bd6aa3b859cd624d2c",
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "y_train = train[\"SalePrice\"]\n",
    "X_train = train.drop(['SalePrice'], axis = 1)\n",
    "\n",
    "test_Id = test['Id']\n",
    "X_test = test.drop(['Id'], axis = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "_cell_guid": "1279ec22-61c2-4d5a-b8da-9dcadbda564d",
    "_uuid": "761f87586c85dde9e74331557080d2278b7b4044",
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# 数据标准化\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# 初始化对目标值的标准化器\n",
    "# 对y标准化不是必须，但对其进行标准化可以使得不同问题w的取值范围相对相同\n",
    "\n",
    "#自己实现试试...,这些参数需要保留，对测试集预测完后还需要对其进行反变换\n",
    "mean_y = y_train.mean()\n",
    "std_y = y_train.std()\n",
    "y_train = (y_train - mean_y)/std_y\n",
    "\n",
    "#ss_y = StandardScaler()\n",
    "#y_train = ss_y.fit_transform(y_train.values.reshape(-1, 1))\n",
    "#y_test = ss_y.transform(y_test.reshape(-1, 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_cell_guid": "9fe1c63e-3803-feac-362c-631399fdb8ec",
    "_uuid": "431749cca6d8fbbc2d1d3732baeb6d63590c6e2e"
   },
   "source": [
    "**1* Linear Regression without regularization**\n",
    "最小二乘线性回归\n",
    "最小二乘没有超参数需要调优，直接用全体训练数据训练模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "_cell_guid": "101cf15a-9006-ac9a-8abe-756371fd8b1a",
    "_uuid": "38562579d6306877a067189f25ce15ececbe99d2"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('RMSE on Training set :', 0.24853506074754278)\n"
     ]
    }
   ],
   "source": [
    "# Linear Regression\n",
    "# 1. 生成学习器实例\n",
    "lr = LinearRegression()\n",
    "\n",
    "#2. 在训练集上训练学习器\n",
    "lr.fit(X_train, y_train)\n",
    "\n",
    "#3.训练上测试，得到训练误差，实际任务中这一步不需要\n",
    "# Look at predictions on training and validation set\n",
    "y_train_pred = lr.predict(X_train)\n",
    "#y_test_pred = lr.predict(X_test)\n",
    "\n",
    "rmse_train = np.sqrt(mean_squared_error(y_train,y_train_pred))\n",
    "#rmse_test = np.sqrt(mean_squared_error(y_test,y_test_pred))\n",
    "\n",
    "print(\"RMSE on Training set :\", rmse_train)\n",
    "#print(\"RMSE on Test set :\", rmse_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('RMSE on Training set :', 0.25647564703430659)\n"
     ]
    }
   ],
   "source": [
    "# 线性模型，随机梯度下降优化模型参数\n",
    "from sklearn.linear_model import SGDRegressor\n",
    "\n",
    "# 使用默认配置初始化线\n",
    "# 1.生成学习器实例 \n",
    "sgdr = SGDRegressor(max_iter = 5000)\n",
    "\n",
    "# 2. 用训练数据训练模型，得到模型参数\n",
    "sgdr.fit(X_train, y_train)\n",
    "\n",
    "# 3. 预测\n",
    "y_train_pred = sgdr.predict(X_train)\n",
    "#y_test_pred = sgdr.predict(X_test)\n",
    "\n",
    "rmse_train = np.sqrt(mean_squared_error(y_train,y_train_pred))\n",
    "#rmse_test = np.sqrt(mean_squared_error(y_test,y_test_pred))\n",
    "\n",
    "print(\"RMSE on Training set :\", rmse_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "该数据集样本数较少，不适合用随机梯度下降方法求解。在训练集上的效果比最小二乘解析求解效果稍差"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_cell_guid": "a70803d8-638c-bdfa-6897-aed8ce78f475",
    "_uuid": "2938aee78a9f61d6ca59345d6fa0793e2422f34c"
   },
   "source": [
    "**2* Linear Regression with Ridge regularization (L2 penalty)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('Best alpha :', 10.0)\n",
      "('cv of rmse :', 0.31897021511259349)\n",
      "('RMSE on Training set :', 0.26825283183497006)\n"
     ]
    }
   ],
   "source": [
    "#RidgeCV缺省的score是mean squared errors \n",
    "# 1. 生成学习器实例\n",
    "# RidgeCV(alphas=(0.1, 1.0, 10.0), fit_intercept=True, normalize=False, scoring=None, cv=None, gcv_mode=None, store_cv_values=False)\n",
    "ridge = RidgeCV(alphas = [0.01, 0.1, 1, 10, 100, 1000],store_cv_values=True )\n",
    "\n",
    "# 2. 用训练数据度模型进行训练\n",
    "# RidgeCV采用的是广义交叉验证（Generalized Cross-Validation），留一交叉验证（N-折交叉验证）的一种有效实现方式\n",
    "ridge.fit(X_train, y_train)\n",
    "\n",
    "#通过交叉验证得到的最佳超参数alpha\n",
    "alpha = ridge.alpha_\n",
    "print(\"Best alpha :\", alpha)\n",
    "\n",
    "# 交叉验证估计的测试误差\n",
    "mse_cv = np.mean(ridge.cv_values_, axis = 0)\n",
    "rmse_cv = np.sqrt(mse_cv)\n",
    "print(\"cv of rmse :\", min(rmse_cv))\n",
    "\n",
    "#训练上测试，训练误差，实际任务中这一步不需要\n",
    "y_train_rdg = ridge.predict(X_train)\n",
    "rmse_train = np.sqrt(mean_squared_error(y_train,y_train_rdg))\n",
    "print(\"RMSE on Training set :\", rmse_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "_cell_guid": "1fb3f0d6-a070-9ce5-7cc3-01a7e26faa4f",
    "_uuid": "659084128a0ef75a5936912a5f64134774b29664"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Try again for more precision with alphas centered around 10.0\n",
      "('Best alpha :', 12.5)\n",
      "('cv of rmse :', 0.31891891935221395)\n",
      "('RMSE on Training set :', 0.27004042322630895)\n",
      "Ridge picked 297 features and eliminated the other 46 features\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEICAYAAABRSj9aAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzt3XmcXFWd/vHPI+uwyQwJIpAQUcAR\nfmwGkFFAwRVHQEZQXAAXUAcU3FBxQwVHcFCZn6PCIAojQpBFkVVEwUEFJoSdoKxKJELYA8gSeOaP\nexoqbXX6du2pPO/Xq15dde+5t7510zl96yzfI9tERMTwek6/A4iIiO5KRR8RMeRS0UdEDLlU9BER\nQy4VfUTEkEtFHxEx5FLRxzMkbSDpSknzJX1Y0t9J+pmkByX9WNI7JP28xnkOlnRsL2JeRAxTJT0s\naakOne8QST/sxLlqvNcir7OkiyS9rxexdMJE4pVkSS/qdkxLmlT0iyFJb5c0s1RkcyWdK+kVHTj1\nQcBFtle2/R/AW4DnAavZ3s32ibZfO95JbH/FdtsVkaRp5T/+0hM91vafbK9k+6kW3veVkuZM9LgJ\nnP8Hkp4o/373SbpA0otH9te9zl2I65ByvT88avuBZfshvY4pOiMV/WJG0keBbwJfoaqEpwLfBnbu\nwOnXAa4f9foPthd04NyxsCNsrwSsBfwZ+F6f4xnxB2CvUdv2LNtjMZWKfjEi6bnAl4D9bJ9u+xHb\nT9r+me1PlDLLSfqmpDvL45uSlms4xz9LukrSA5J+K2njsv2XwKuAb5U7zZOAzwNvLa/fK2lvSZc0\nnGvDcjd6n6S7JB1cti/UzCHpZeW9HpB0taRXNuy7SNKXJf2mNBn9XNKksvvX5ecDJYatJb1I0sWl\nOekeSTPGuFYLfRsY530aj1sROBdYs7znw5LWLLuXlXRCOf56SdMbjltT0mmS5km6bfRd8Vhs/xU4\nBdi04Vyjr/NrJN1YPvO3ADXsW0rSkeVa3CZp/1Gf+7mSvle++f1Z0qHjNGf9L7CCpA3L8RsCf1e2\nN16nfSTdXP7tz2y4RouMt+x/j6TZku6XdL6kdepcq2hdKvrFy9bA8sAZiyjzGeBlVBXHJsCWwGcB\nJG0OHAe8H1gNOBo4U9JytrcH/gfYvzR57EH1rWFGeb3QHaeklYFfAOcBawIvAi4cHYyktYCzgUOB\nfwA+DpwmaXJDsbcD7wZWB5YtZQC2LT9XLTH8Dvgy8HPg74G1gf+/iGsx2ljv8wzbjwBvAO4s77mS\n7TvL7p2Ak4FVgTOBb5XP+BzgZ8DVVHfoOwAHSnrdeAGVPyx7ADePsX8ScBrVv+Ek4Bbg5Q1F9inx\nbgpsDuwy6hTHAwuo/n02A14LjNes9t9Ud/FQ3d2fMCqm7YF/A3YHng/8keq6jBuvpF2Ag4FdgclU\nv3MnjRNPtCkV/eJlNeCecZpS3gF8yfbdtucBXwTeVfbtAxxt+zLbT9k+Hnic6g/DRP0z8BfbR9p+\nzPZ825c1KfdO4Bzb59h+2vYFwExgx4Yy37f9h2Z3t008SdWktGZ530sWUXa0ibxPM5eUz/EUVWW4\nSdm+BTDZ9pdsP2H7VuC/gLct4lwfl/QAMB94Bc/+G422I3CD7VNtP0nVbPeXhv27A0fZnmP7fuCr\nIzskPY/qj8CB5dvf3cA3xokL4IfAHpKWKWVHd0K/AzjO9izbjwOfBraWNK1GvO8H/s327PJ7/BVg\n09zVd1cq+sXLvcAkLbpzck2qO6wRfyzboKogP1aaUB4oFc2Uhv0TMYXqbm086wC7jXrPV1DdCY5o\nrAgeBVZaxPkOomoKuLw0n7xnAjFP5H3qHL98+bdYh6qpp/EzHkzVhzKWf7e9KjAN+CuwwRjl1gTu\nGHnhKgvhHWPtH/V8HWAZYG5DXEdTfaMZk+0/UX3D+Apwk+07RhVZ6HfM9sNUv5tr1Yh3HeCohnju\no/r3XGtRMUV7JjyaIfrqd8BjVF/PTx2jzJ0s3Kk6tWyD6j/cYbYP60Asd1A1OdQp99+292nhPf4m\ntartv1B9M0HVSKNfSPq17aZNHy2aaErXO4DbbK834Tey/yTpAOB4SWeVbxuN5lL9UQVAkhpfl/1r\nN7xu3HcH1Te2SS10qJ9A1cz37ib7Rn7HRmJakerb5p9rxDvyO3jiBOOJNuSOfjFi+0GqDtL/lLSL\npBUkLSPpDZKOKMVOAj4raXJpL/08z371/i/gA5K2UmVFSW8s7e0TdRawhqqhd8tJWlnSVk3K/RB4\nk6TXlY7D5VUNX1y7SdnR5gFPA+uObJC0W8Ox91NVyhMeQjmOu4DVVHV+13E58JCkT6qae7CUpI0k\nbVHn4NKcdSewb5PdZwMbStq1fHv4MLBGw/5TgAMkrSVpVeCTDeedS9WfcaSkVSQ9R9ILJW1XI6wZ\nVO35pzTZ9yPg3ZI2VdXR/xXgMtu314j3u8CnGzp7nytptxrxRBtS0S9mbH8d+ChVZ9c8qjuk/YGf\nlCKHUrWBXwNcC8wq27A9k+pu+FtUleTNwN4txjEfeA3wJqomjZuoRu2MLncH1dDPgxvi/QQ1fvds\nPwocBvymfNV/GVV7+GWSHqbqED3A9m2tfIZFvO+NVH8wby3vu8imrdJm/yaqNv/bgHuAY4G6fygA\nvgYcpIYRUuXc9wC7UbW93wusB/ymoch/UVXm1wBXAudQdb6O/PHbk6rj+Qaqf/NTWbjZbKzP9Ffb\nv2jyDQPbFwKfo+p0nQu8kNLuP168ts8ADgdOlvQQcB1VP0J0kbLwSMTwkPQG4Lu207kZz8gdfcRi\nrDQV7Shp6TKU9QssevhtLIFyRx+xGJO0AnAx8GKq0TtnUzVnPdTXwGKgpKKPiBhyabqJiBhyAzGO\nftKkSZ42bVq/w4iIWKxcccUV99iePF65gajop02bxsyZM/sdRkTEYkXSH8cvlaabiIihN+4dvaQp\nVNOh16CapXiM7aMkbUo1y215qgka/2r78nLMK6mSGS1DlYSrzky8WAxN+9TZ/Q4hYrF2+1ff2PX3\nqNN0swD4mO1ZZar8FZIuAI4Avmj7XEk7ltevLNOwvw28vuTxWGQCpYiI6K5xK/qSL2NueT5f0myq\nTHMGVinFnsuzibPeDpxeMuBRUqNGRESfTKgztuSb3gy4DDgQOF/Sv1O19f9TKbY+sIyki4CVqXJl\nn9DkXPtSkjhNnTq1tegjImJctTtjJa1ElcTowDLr7oPAR2xPAT7Cs2teLg28FHgj8Drgc5LWH30+\n28fYnm57+uTJ444OioiIFtWq6MtKM6cBJ9o+vWzeCxh5/mOqJesA5gDnlRVt7qFa93MTIiKiL+qM\nuhHV3frskiJ3xJ3AdsBFwPZUaWoBfkq1wPTSVOlRt6JaviyGUC9GDEREe+q00b+caj3LayVdVbYd\nTJXX/KhSoT9GaW+3PVvSeVT5sZ8GjrV9Xccjj4iIWuqMurmEak3HZl46xjFfo1pIISIi+iwzYyMi\nhlwq+oiIIdfxFAil8/YoYEfgUWBv27O69QGiv5ICIZYEi/ugg46nQKBa6He98tgK+E75GRERfTBu\n043tuSN35LbnA+OlQNgZOMGVS4FVJY276nxERHRHN1IgrAXc0XDYnLJt7qhzJQVCREQPdCMFQrOh\nmH+zMG1SIERE9EatO/pFpEA4oDz/MXBseT4HmNJw+No826wTQ2Zx76SKWBKMe0dfIwUCLJwC4Uxg\nT1VeBjxYUh1HREQfdDwFAnAO1dDKm6mGV767oxFHRMSEdDwFgm0D+7UZV0REdEhmxkZEDLk6bfRT\nJP1K0mxJ10s6oGyfIemq8ri9oVkHSZ+WdLOk30t6XTc/QERELFrLM2Ntv3WkgKQjgQfL85cAbwM2\nBNYEfiFpfdtPdT786LekQFgyZbTV4qWdmbHAM6NydgdOKpt2Bk62/bjt26g6ZbckIiL6YkJt9KNm\nxo7YBrjL9sjwyrFmxkZERB+0MzN2xB48ezcPNWfGStpX0kxJM+fNm1c3jIiImKB2FgenjKHfFZjR\nULzWzNikQIiI6I12FgcHeDVwo+05DdvOBH4k6etUnbHrAZd3KN4YMOmUixh8Lc+MtX0O1eiaxmYb\nbF8v6RTgBqoRO/tlxE1ERP+0NTPW9t5jbD8MOKytyCIioiMyMzYiYsiloo+IGHJtVfSSVpV0qqQb\nS4qErSVtKunSkhphpqRMloqI6KMJLSXYxFHAebbfImlZYAXgFJovGh5DKCkQFl8ZMbXkaLmil7QK\nsC2wN4DtJ4AnJI21aHhERPRBO3f06wLzgO9L2gS4gmppwbEWDV9IFgePiOiNdtrolwY2B75jezPg\nEeBTjL1o+EIyMzYiojfaqejnAHNsjyQ4O5Wq4t8LGEmT8GOSuTIioq9abrqx/RdJd0jawPbvgR2o\nZsOuS7Vo+EUsvGh4DKF06EUMvnZH3XwIOLGMuLmVaiHwn9J80fCIiOiDtip621cB00dtvoQmi4ZH\nRER/ZGZsRMSQS0UfETHkulbRSzpA0nWSrpd0YLfeJyIiFq3dztimJG0E7EM1tPIJ4DxJZzesKxtD\nIikQ+iujnqKObt3R/yNwqe1HbS8ALgbe3KX3ioiIRehWRX8dsK2k1SStAOzIwuvIZnHwiIge6UpF\nb3s2cDhwAXAecDXVsoKNZZICISKiB7rWGWv7e7Y3t70tcB+ZIRsR0Rdd6YwFkLS67bslTQV2Bbbu\n1ntF/6QzMGLwda2iB06TtBrwJLCf7fu7+F4RETGGrlX0trfp1rkjIqK+zIyNiBhyqegjIoZcW003\nklYFjgU2Agy8B/g9MAOYBtwO7J72+eGVmbG9lc7vaEW7d/RHAefZfjGwCTCbajnBC22vB1xYXkdE\nRJ+0XNFLWgXYlrImrO0nbD8A7AwcX4odD+zSbpAREdG6du7o1wXmAd+XdKWkYyWtCDzP9lyA8nP1\nZgcnBUJERG+0U9EvTbUY+HdsbwY8wgSaaZICISKiN9qp6OcAc2xfVl6fSlXx3yXp+QDl593thRgR\nEe1oedSN7b9IukPSBrZ/D+wA3FAeewFfLT9/2pFIYyBlFEjE4Gt3ZuyHgBMlLQvcCryb6lvCKZLe\nC/wJ2K3N94iIiDa0VdHbvgqY3mTXDu2cNyIiOiczYyMihlwq+oiIIVer6UbSccA/A3fb3qhs+zLV\n5KinqUbW7G37zoZjtgAuBd5q+9ROBx6DISkQWpeO7OiVunf0PwBeP2rb12xvbHtT4Czg8yM7JC1F\ntZTg+Z0IMiIiWlerorf9a6rlABu3PdTwckWqpGYjPgScRsbQR0T0XbvZKw8D9gQeBF5Vtq0FvBnY\nHthiEcfuC+wLMHXq1HbCiIiIRWirM9b2Z2xPAU4E9i+bvwl80vZT4xybFAgRET3QqaUEfwScDXyB\nalz9yZIAJgE7Slpg+ycdeq+IiJiAlit6SevZvqm83Am4EcD2CxrK/AA4K5X88MrIkYjBV3d45UnA\nK4FJkuZQ3bnvKGkDquGVfwQ+0K0gIyKidbUqett7NNn8vRrH7T3RgCIiorMyMzYiYsiloo+IGHLj\nNt1ImgKcAKxB1R5/jO2jGvZ/HPgaMNn2PZJ2Br5cyi4ADrR9STeCj/5LCoSFpXM6BlGdNvoFwMds\nz5K0MnCFpAts31D+CLyGKu/8iAuBM21b0sbAKcCLOx55RETUMm7Tje25tmeV5/OB2cBaZfc3gINo\nSH9g+2HbI69Hp0aIiIgem1AbvaRpwGbAZZJ2Av5s++om5d4s6UaqSVTvGeNc+0qaKWnmvHnzJhx4\nRETUU7uil7QSVaKyA6macz5DQ8bKRrbPsP1iYBeq9vpmZZICISKiB2pV9JKWoarkT7R9OvBC4AXA\n1ZJuB9YGZklao/G4kvXyhZImdTTqiIiorc6oG1FNjppt++sAtq8FVm8oczswvYy6eRFwS+mM3RxY\nFri3G8FH/2WUScTgqzPq5uXAu4BrJV1Vth1s+5wxyv8LsKekJ4G/Uq0wlQ7ZiIg+GbeiL2PgNU6Z\naQ3PD6daXSoiIgZAZsZGRAy5VPQREUOu5RQIknYDDgH+EdjS9sxS/jXAV6k6YZ8APmH7l90JP/ot\nKRCelY7pGFQtp0AArgN2BY4eVf4e4E2275S0EXA+z86kjYiIHqvTGTsXmFuez5c0G1jL9gUAZcnA\nxvJXNry8Hlhe0nK2H+9Y1BERUVvLKRBqHvIvwJXNKvmkQIiI6I2WUiDYfqhG+Q2phlm+v9n+pECI\niOiNumvGjk6BMF75tYEzgD1t39JeiDHI0gEZMfjGvaNvlgJhnPKrUmWt/LTt37QfYkREtKNO081I\nCoTtJV1VHjuWVMRzgK2BsyWdX8rvD7wI+FxD+dXHOHdERHRZuykQzmhS/lDg0DbjioiIDsnM2IiI\nIVenjX6KpF9Jmi3pekkHlO2bSPqdpGsl/UzSKg3HbFz2XV/2L9/NDxEREWNrZ2bsscDHbV8s6T3A\nJ6ja5ZcGfgi8y/bVklYDnuzWB4j+SgqESkYfxSBrZ3HwDYBfl2IXUE2OAngtcM3IWrK277X9VKcD\nj4iIetqZGXsdsFPZtRswpTxfH7Ck8yXNknRQZ0KNiIhWtDMz9j3AfpKuAFamylQJVXPQK4B3lJ9v\nlrRDk/MlBUJERA+0ujg4tm+0/VrbLwVOAkZmwM4BLrZ9j+1HgXOAzUefMykQIiJ6o6XFwcv21W3f\nLek5wGeB75Zd5wMHSVqB6i5/O+AbHY88BkI6ISMGX8uLgwPrSdqvvD4d+D6A7fslfR34X8DAObYz\nNCMiok/anRl71BjH/JBqiGVERPRZZsZGRAy5VPQREUOunRQIMxqyU97e0H6fFAgREQOk5RQItt86\nUkDSkcCD5fkSmQJhSU0FkFE3EYOv5cXBgRvgmeGXuwPbl0P+JgVCF+KOiIiaOrE4+DbAXbZvKq+T\nAiEiYoDUWjMWFrk4+B5UM2Mbz/kKYAvgUeBCSVfYvnDU+fYF9gWYOnVqa9FHRMS4Wk6BULYvDewK\nzGgonhQIEREDpOUUCMWrgRttz2nYtkSmQEinZEQMqpYXBy/73sbCzTbYvh8YSYFwFTArKRAiIvqn\nrRQItvceY3tSIEREDIjMjI2IGHKp6CMihlydFAjLS7pc0tUlpcEXy/YTJf1e0nWSjisjc1DlPyTd\nLOkaSX8z4iYiInqnzjj6x4HtbT9cKvNLJJ0LnAi8s5T5EfA+4DvAG4D1ymOrsm2rTgc+aJICISIG\n1bh39K48XF4uUx62fU7ZZ+ByYO1SZmfghLLrUmBVSc/vRvARETG+uhOmlirZKe8GLrB9WcO+ZaiG\nX55XNq0F3NFw+JyybfQ5szh4REQP1KrobT9le1Oqu/YtJW3UsPvbwK9t/0953WwoppucMzNjIyJ6\nYEKjbmw/AFwEvB5A0heAycBHG4rNAaY0vF4buLOtKCMiomV1UiBMBp60/YCkv6NKe3C4pPcBrwN2\nsP10wyFnAvtLOpmqE/bBkup4qKVTMiIGVZ1RN88Hjpe0FNU3gFNsnyVpAfBH4HdVOhxOt/0lqiRm\nOwI3U2WvfHdXIo+IiFrqpEC4hioH/ejtTY8to3D2az+0iIjohMyMjYgYcqnoIyKGXJ0UCMdJulvS\ndQ3bNpV0aUlZPFPSlmX7O0rag2sk/VbSJt0MPiIixlenM/YHwLeAExq2HQF80fa5JTf9EcArgduA\n7WzfL+kNwDEMYfqDJTXdQTMZbRQx+Op0xv66LAq+0GZglfL8uZRx8rZ/21DmUp5NixAREX1Se3Hw\nUQ4Ezpf071TNP//UpMx7gXPHOkEWB4+I6I1WO2M/CHzE9hTgI1Rryj5D0quoKvpPjnWCpECIiOiN\nViv6vYDTy/MfA1uO7JC0MXAssLPte9sLLyIi2tVq082dwHZUeW+2B24CkDSV6g/Au2z/oRMBDqJ0\nQEbE4qROrpuTqEbUTJI0B/gCsA9wlKSlgccobe3A54HVgG+XtAgLbE/vQtwREVFTnVE3e4yx66VN\nyr6PaqWpiIgYEJkZGxEx5FLRR0QMuTpt9FOoZsWuATwNHGP7qLLvQ8D+wALgbNsHSVoWOBqYXsof\nYPui7oTfO5kN21w6piMGX51RNwuAj9meJWll4ApJFwDPo1oIfGPbj0tavZTfB8D2/yvbzpW0xajF\nSSIiokfGbbqxPdf2rPJ8PjCbarHvDwJftf142Xd3OeQlwIUN2x6guruPiIg+mFAbfcl5sxlwGbA+\nsI2kyyRdLGmLUuxqYGdJS0t6AdXonClNzrVvyXw5c968ee18hoiIWITaE6YkrQScBhxo+6Eyhv7v\ngZcBWwCnSFoXOA74R2Am1VKDv6Vq/lmI7WOoslsyffp0t/k5IiJiDLUqeknLUFXyJ9oeSX0wh2qd\nWAOXS3oamGR7HlX+m5Fjf0uZORsREb1XZ9SNqJKWzbb99YZdP6FKf3CRpPWBZYF7JK0AyPYjkl5D\nNTv2hi7E3lMZXRIRi6s6d/QvB94FXCvpqrLtYKommuPKylNPAHvZdhlpc365w/9zOTYiIvqkTgqE\nSwCNsfudTcrfDmzQXlgREdEpmRkbETHkUtFHRAy5llMgSJrBs000qwIP2N5U0mrAqVRDLn9ge//u\nhN4bSX2waOmkjhh8LadAsP3WkQKSjgQeLC8fAz4HbFQeERHRR+2kQACeGX65O3BSKfNI6cB9rCsR\nR0TEhLSTAmHENsBdtic0KSopECIieqN2RT86BULDrj0od/MTYfsY29NtT588efJED4+IiJraSYFA\nyXezK02WFYyIiMHQTgoEgFcDN9qe043gBkFGlUTE4q7lFAi2zwHeRpNmG0m3A6sAy0raBXjtMOS7\niYhYHLWVAsH23mNsn9ZWVBER0TGZGRsRMeRS0UdEDLnaK0w1U9ri5wNPUeWdny5pN+AQqlWmtrQ9\ns90geyXpDiYundURg6+tir54le17Gl5fRzXk8ugOnDsiItrUiYp+IbZnA1SjMiMiot/abaM38HNJ\nV0jadyIHJgVCRERvtFvRv9z25sAbgP0kbVv3wKRAiIjojbYqett3lp93A2cAW3YiqIiI6JyW2+gl\nrQg8x/b88vy1wJc6FlkfZARJRAyjdu7onwdcIulq4HLgbNvnSXqzpDnA1sDZks7vRKAREdGalu/o\nbd8KbNJk+xlUzTgRETEAMjM2ImLIpaKPiBhydfLRTwFOANYAngaOsX2UpBnABqXYqsADtjeV9Brg\nq8CywBPAJ2z/sivRd0hSH7QuHdgRg69OG/0C4GO2Z0laGbhC0gW23zpSQNKRwIPl5T3Am2zfKWkj\n4HwaFhOPiIjeqpOPfi4wtzyfL2k2VcV9AzyzAtXuwPalzJUNh18PLC9pOduPdzj2iIioYUJt9JKm\nAZsBlzVs3ga4y/ZNTQ75F+DKZpV8UiBERPRG7Ype0kpUC4QfaPuhhl170Hw5wQ2Bw4H3NztfUiBE\nRPRGrXH0kpahquRPtH16w/alqVISv3RU+bWpxtLvafuWzoXbHelQjIhhNu4dfWmD/x4w2/bXR+1+\nNXCj7TkN5VcFzgY+bfs3nQw2IiImrk7TzcuBdwHbS7qqPHYs+97G3zbb7A+8CPhcQ/nVOxdyRERM\nRJ1RN5cATVcRsb13k22HAoe2HVlERHREZsZGRAy5Om30UyT9StJsSddLOqBsn9HQNHO7pKvK9mmS\n/tqw77vd/hARETG2bsyMBbjF9qYdjnVCktagNzJiKWLwdXxmbEREDJZuzYx9gaQrJV0saZu2o4yI\niJbVXnhkAjNj5wJTbd8r6aXATyRtOOoYJO0L7AswderUVuOPiIhx1LqjrzEzdsbINtuP2763PL8C\nuAVYf/Q5kwIhIqI36uSjn+jM2MnAfbafkrQusB5wawdjriWdhBERlW7MjN0WuKYsGn4q8AHb93Us\n4oiImJBuzIw9jaqZJyIiBkBmxkZEDLlU9BERQ67jKRAajpsq6WFJH+9W8BERMb5upUAA+AZwbudC\nXbSkPOiPjG6KGHxdSYEgaReqIZWPdCHmiIiYgI6nQJC0IvBJ4IvjnCuLg0dE9EA3Fgf/IvAN2w8v\n6nyZGRsR0RvdWBx8K+Atko4AVgWelvSY7W91LuyIiKir4ykQbG/TcOwhwMO9qOTTKRgR0Vw3UiBE\nRMQA6XgKhFH7D2kpqoiI6JjMjI2IGHKp6CMihlw7KRC+LOma0mb/c0lrjjpuC0lPSXpLt4KPiIjx\ntZwCAfia7c8BSPow8HngA+X1UsDhwPndCXthSX/QPxntFDH4xr2jtz3X9qzyfD4wG1hr1KSpFQE3\nvP4Q1bj7uzsYa0REtKD24uDwtykQJB0G7EmV0OxVZdtawJupct9ssYhzZXHwiIgeaCsFgu3P2J4C\nnAjsX4p+E/ik7acWdb6kQIiI6I22UiA0+BFwNvAFYDpwcjWhlknAjpIW2P5JZ0KOiIiJaDkFgqT1\nRjJWAjsBNwLYfkFDmR8AZ3W7kk+HYETE2Orc0Y+kQLi2YRWpg4H3StoAeBr4I2XETUREDJZ2UiCc\nU+PYvVuIKSIiOki2xy/V7SCkeVTfCgbZJOCefgcxgHJdmst1GVuuTXOtXJd1bI87mmUgKvrFgaSZ\ntqf3O45Bk+vSXK7L2HJtmuvmdUmum4iIIZeKPiJiyKWir++YfgcwoHJdmst1GVuuTXNduy5po4+I\nGHK5o4+IGHKp6CMihlwq+jFI+gdJF0i6qfz8+zHKnSfpAUln9TrGXpL0ekm/l3SzpE812b+cpBll\n/2Ul0+nQq3FdtpU0S9KCJWkRnhrX5aOSbiiLF10oaZ1+xNkPNa7NByRdWxZ1ukTSS9p+U9t5NHkA\nRwCfKs8/BRw+RrkdgDdR5fTpe9xduhZLAbcA6wLLAlcDLxlV5l+B75bnbwNm9DvuAbku04CNgROA\nt/Q75gG6Lq8CVijPP7gk/L5M4Nqs0vB8J+C8dt83d/Rj2xk4vjw/HtilWSHbFwLzexVUn2wJ3Gz7\nVttPACdTXZ9GjdfrVGCHkhBvmI17XWzfbvsaqpxQS4o61+VXth8tLy8F1u5xjP1S59osalGnlqSi\nH9vzbM+FapUtYPU+x9NPawF3NLyeU7Y1LWN7AdViNKv1JLr+qXNdlkQTvS7vBc7takSDo9a1kbSf\npFuoWhY+3O6bTmiFqWEj6RcUgXvHAAABbElEQVTAGk12fabXsQy4Znfmo+8y6pQZNkviZ66j9nWR\n9E6qNSy262pEg6PWtbH9n8B/Sno78Flgr3bedImu6G2/eqx9ku6S9HzbcyU9nyV7/ds5wJSG12sD\nd45RZo6kpYHnAvf1Jry+qXNdlkS1roukV1PdVG1n+/EexdZvE/2dORn4TrtvmqabsZ3Js39F9wJ+\n2sdY+u1/gfUkvUDSslSdrWeOKtN4vd4C/NKlN2mI1bkuS6Jxr4ukzYCjgZ1sL0k3UXWuzXoNL98I\n3ES7+t0LPagPqvblC8tFvhD4h7J9OnBsQ7n/AeYBf6X6a/26fsfepeuxI/AHqhEDnynbvkT1HxVg\neeDHwM3A5cC6/Y55QK7LFuX34hHgXuD6fsc8INflF8BdwFXlcWa/Yx6ga3MUcH25Lr8CNmz3PZMC\nISJiyKXpJiJiyKWij4gYcqnoIyKGXCr6iIghl4o+ImLIpaKPiBhyqegjIobc/wHlay5jv2n74wAA\nAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x102a01690>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "#作业这一步不是必须\n",
    "print(\"Try again for more precision with alphas centered around \" + str(alpha))\n",
    "ridge = RidgeCV(alphas = [alpha * .6, alpha * .65, alpha * .7, alpha * .75, alpha * .8, alpha * .85, \n",
    "                          alpha * .9, alpha * .95, alpha, alpha * 1.05, alpha * 1.1, alpha * 1.15,\n",
    "                          alpha * 1.25, alpha * 1.3, alpha * 1.35, alpha * 1.4], \n",
    "                store_cv_values=True)\n",
    "ridge.fit(X_train, y_train)\n",
    "alpha = ridge.alpha_\n",
    "print(\"Best alpha :\", alpha)\n",
    "\n",
    "mse_cv = np.mean(ridge.cv_values_, axis = 0)\n",
    "rmse_cv = np.sqrt(mse_cv)\n",
    "print(\"cv of rmse :\", min(rmse_cv))\n",
    "\n",
    "y_train_rdg = ridge.predict(X_train)\n",
    "rmse_train = np.sqrt(mean_squared_error(y_train,y_train_rdg))\n",
    "print(\"RMSE on Training set :\", rmse_train)\n",
    "\n",
    "\n",
    "# Plot important coefficients\n",
    "coefs = pd.Series(ridge.coef_, index = X_train.columns)\n",
    "print(\"Ridge picked \" + str(sum(coefs != 0)) + \" features and eliminated the other \" +  \\\n",
    "      str(sum(coefs == 0)) + \" features\")\n",
    "\n",
    "#正系数值最大的10个特征和负系数值最小（绝对值大）的10个特征\n",
    "imp_coefs = pd.concat([coefs.sort_values().head(10),\n",
    "                     coefs.sort_values().tail(10)])\n",
    "imp_coefs.plot(kind = \"barh\")\n",
    "plt.title(\"Coefficients in the Ridge Model\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "_cell_guid": "2da33461-fb1a-c09a-b0fe-d20e8c8f52e3",
    "_uuid": "d2e8bab153344d6ecf2e48909b76bd8f387e2710"
   },
   "source": [
    "**3* Linear Regression with Lasso regularization (L1 penalty)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "_cell_guid": "8525724a-fb77-66d4-e06f-f3365fbd8ec3",
    "_uuid": "f8f41944e405693ba867158353ffedcd9ecd4a07",
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/qing/anaconda2/lib/python2.7/site-packages/sklearn/linear_model/coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.\n",
      "  ConvergenceWarning)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('Best alpha :', 0.001)\n",
      "Try again for more precision with alphas centered around 0.001\n",
      "('Best alpha :', 0.0010500000000000002)\n",
      "('cv of rmse :', 0.28353734545829701)\n",
      "('RMSE on Training set :', 0.27531394757319033)\n",
      "Lasso picked 129 features and eliminated the other 214 features\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEICAYAAABRSj9aAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzt3XmcXFWd/vHPw46yKQmyJURHwFGE\noGEbZGcGjAjqwAAqEn8CgqCAOsKIgqDMDKhIZnAXBlFkB0VWoyMijsAkGJYYlF0yRBLAsIgsCc/v\nj3vaVJrq7uquLV153q9XvVJ17rn3fqu6c/rWued8j2wTERG9a7luBxAREe2Vhj4ioseloY+I6HFp\n6CMielwa+oiIHpeGPiKix6Whj7+StKmk30h6WtLHJK0q6ceSnpR0iaT3SfpJA8f5tKTvdCLmQWIY\nL+kZScu36Hifk/T9VhxrWSPpQUm7N1BvgiRLWqETcS1L0tCPQpLeK2l6acjmSrpW0ttacOhPATfY\nXt32fwD7Aq8B1ra9n+3zbf/DUAex/a+2D2k2mGb+49v+g+3VbC8awXl3ljRnuPsN4/jnSvpCu44/\nUiUuS9q7X/mZpXxKl0KLJqWhH2UkfRw4E/hXqkZ4PPA1YJ8WHH4jYFa/17+3vbAFx47R4ffAwX0v\nyh/Z/YD7uhZRNC0N/SgiaU3gFOBI25fb/rPtF23/2PY/lzorlyuwR8rjTEkr1xxjL0kzJS2Q9D+S\nNi/l/w3sApxVvilcAJwI7F9ef0jSFEk31RzrTZKmSXpC0qOSPl3Kl+jmkLRtOdcCSbdL2rlm2w2S\nPi/pV6XL6CeSxpTNN5Z/F5QYtpP0ekm/KN1Jj0m6aIDPaolvA0Ocp3a/VwLXAuuXcz4jaf2yeSVJ\n55X9Z0maVLPf+pIukzRf0gOSPtbIz7TO+adKeljSU5JmSNqhZtvW5ZvcU+XzPqOUryLp+5IeL5/x\n/0p6TU1cV5af0b2SDh0ihB8D20t6VXm9J3AH8MeaOJaT9BlJD0maVz6TNWu2H1S2PS7phH7vbzlJ\nx0u6r2y/WNKrR/JZRePS0I8u2wGrAFcMUucEYFtgIrAFsDXwGQBJbwHOAT4MrA18E7hS0sq2dwV+\nCRxVujwOpPrWcFF5fXbtSSStDvwUuA5YH3g98LP+wUjaALga+ALwauCTwGWSxtZUey/wQWAdYKVS\nB2DH8u9aJYZfA58HfgK8CtgQ+M9BPov+BjrPX9n+M/B24JFyztVsP1I27w1cCKwFXAmcVd7jclQN\n5O3ABsBuwDGS9hhGbH3+l+pn92rgB8AlklYp26YCU22vAfwNcHEpPxhYExhH9XM9HPhL2XYBMIfq\nZ7Qv8K+Sdhvk/M+V93ZAef0B4Lx+daaUxy7A64DVWPxZvBH4OnBQOefaVD+nPh8D3gXsVLb/Cfjq\nIPFEC6ShH13WBh4boivlfcAptufZng+cTPWfDuBQ4Ju2b7G9yPZ3geep/jAM117AH21/2fZztp+2\nfUudeu8HrrF9je2XbE8DpgOTa+r8l+3f2/4LVeM1cZDzvkjVpbR+Oe9Ng9Ttbzjnqeem8j4WAd+j\n+kMKsBUw1vYptl+wfT/wbRY3lg2z/X3bj9teaPvLwMrApmXzi8DrJY2x/Yztm2vK1wZeX36uM2w/\nJWkc8DbguPJZzQS+w+Lfh4GcB3ygXKXvBPyw3/b3AWfYvt/2M8C/AAeUb0/7AlfZvtH288BngZdq\n9v0wcILtOWX754B9lRuwbZWGfnR5HBgzxH+K9YGHal4/VMqgaiA/Ub7eL5C0gOoqcH2GbxyN9dtu\nBOzX75xvA9arqfPHmufPUl0hDuRTgIBbS/fJ/xtGzMM5TyP7r1J+FhtRdfXUvsdPU91DGRZJn5A0\nu3RNLaC6Uu/rYvoQsAlwd+me2auUfw+4HrhQVXfd6ZJWpPq5PmH76ZpTPET1rWNA5Y/nWKpvgleV\nP4y16v2OrVDe7/rAwzXH+jPV722fjYAraj6n2cAiRvBZRePyV3R0+TXVV+t3AZcOUOcRlrypOr6U\nQfUf8FTbp7YgloeBAxus9z3bQ/UN1/Oy1Kq2/0j1zQRVI41+KulG2/eO4PgNn3cIDwMP2N64mZOW\n/vjjqLp+Ztl+SdKfqP6wYfse4MDSVfQe4FJJa5fG9GTgZEkTgGuA31F1cb1a0uo1jf144P8aCOf7\nVPdodqmzre93rM94YCHwKDAX+Nua9/QKqm8bfR4G/p/tX9V5/xMaiCtGIFf0o4jtJ6n+831V0rsk\nvULSipLeLun0Uu0C4DOSxpabjSdS/aeFqjvhcEnbqPJKSe8o/e3DdRWwrqRjVN0AXl3SNnXqfR94\np6Q9JC1fbhzuLGnDOnX7m0/1tf91fQWS9qvZ909UjfKwh1AO4VFg7dobjEO4FXhK0nGq5h4sL2kz\nSVsNsk/fZ9H3WAlYnarBnA+sIOlEYI2+HSS9X9JY2y8BC0rxIkm7SHqzqjkDT1F15Syy/TDwP8C/\nlXNsTvWt4PwG3tN/AH/P4hvitS4AjpX0WkmrsfhezkKqC5C9JL2tvKdTWLKd+QZwqqSNynsaK6kV\nI8ZiEGnoRxnbZwAfp/paPZ/qCukoFvejfoGqD/wO4E7gtlKG7elUV8NnUTWS91LdVBtJHE9TNQTv\npOrSuIc6V3+lsdmHqiujL95/poHfPdvPAqcCvypf9bel6g+/RdIzVDcNj7b9wEjewyDnvZuqMbu/\nnHfQrq3SZ/9Oqj7/B4DHqPrCB/tDcTzVDdO+x39Tdb9cSzXE8SGqb28P1+yzJzCrvPepwAG2nwPW\npWpgn6LqCvkFi/+4HwhMoLoKvwI4qdwnGeozeML2z1x/wYpzqLqLbizv9zngo2W/WcCRVDeS51L9\nntXOSZhK9XP7iaSngZuBehcI0ULKwiMREb0tV/QRET0uDX1ERI9LQx8R0ePS0EdE9LilYhz9mDFj\nPGHChG6HERExqsyYMeMx22OHqrdUNPQTJkxg+vTp3Q4jImJUkfTQ0LXSdRMR0fOGvKIviZHOo5qU\n8RLwLdtTJU2kmuW2CtVsvo/YvrXsszNVzvQVqZJw7dSe8KPTJhx/dbdDiOgpD/77O9p+jka6bhYC\nn7B9W5kqP0PSNOB04GTb10qaXF7vLGktqoUw9rT9B0nrtC36iIgY0pANve25VFOZsf20pNlU2e/M\n4jwca7I4cdZ7gctt/6HsM6/VQUdEROOGdTO2ZJfbErgFOAa4XtKXqPr6/65U2wRYUdINVEmaptru\nv3ABkg4DDgMYP378yKKPiIghNXwztmSpuww4xvZTwBHAsbbHAccCfSsQrQC8FXgHsAfwWUmb9D+e\n7W/ZnmR70tixQ44OioiIEWqooS+LGFwGnG/78lJ8MND3/BKqJeugylR3XVnP9DGqDHdbEBERXdHI\nqBtRXa3PLily+zxCtczYDcCuVGlqAX5EtcD0ClTrcm4DfKWFMUcXdWKEQES0ViN99NtTrTF5p6SZ\npezTVHnNp5YG/TlKf7vt2ZKuo8qH/hLwHdt3tTzyiIhoSCOjbm6iLGVWx1sH2OeLwBebiCsiIlok\nM2MjInpcGvqIiB7X8hQI5ebtVGAy8CwwxfZt7XoD0VlJgRCjRQYOLNbyFAjA24GNy2Mb4Otk8d+I\niK4ZsuvG9ty+K3LbT1OtMj9YCoR9gPNcuRlYS9J6LY88IiIa0o4UCBsAD9fsNqeUze13rKRAiIjo\ngHakQKg3FNMvK0gKhIiIjmjoin6QFAhHl+eXAN8pz+cA42p235DF3ToxyuUGV8ToM+QVfQMpEGDJ\nFAhXAh9QZVvgyZLqOCIiuqDlKRCAa6iGVt5LNbzygy2NOCIihqXlKRBsGziyybgiIqJFMjM2IqLH\nNdXQS1pL0qWS7pY0W9J2kiZKulnSTEnTJW099JEiIqJdhjWOvo6pVIuM7CtpJeAVwMXUnzEbPSAp\nEHpbRlX1phE39JLWAHYEpgDYfgF4QdJAM2YjIqILmrmifx0wH/gvSVsAM6jG1Q80YzYiIrqgmT76\nFYC3AF+3vSXwZ+B4Bp4xuwRJh5U+/Onz589vIoyIiBhMMw39HGCO7VvK60upGv6BFg1fQlIgRER0\nxoi7bmz/UdLDkja1/TtgN+C3VF069RYNjx6Qm3URo0+zo24+CpxfRtzcTzUL9kfUnzEbERFd0FRD\nb3smMKlf8U0MsGh4RER0XmbGRkT0uDT0ERE9rpE0xeMk/bykOJgl6ehSflFJczBT0oM1mS2R9C+S\n7pX0O0l7tPMNRETE4Ea8OLjt/fsqSPoy8GR5/kbgAOBNwPrATyVtYntR68OPTksKhN6VEVW9q5nF\nwYG/LkzyT8AFpWgf4ELbz9t+gCovfRKbRUR0ybD66PstDt5nB+BR233j5QdaHLz/sTIzNiKiA5pZ\nHLzPgSy+mocsDh4RsVRpZnFwyqSo97DkuPksDh4RsRQZsqEfZHFwgN2Bu23PqSm7EviBpDOobsZu\nDNzaonijy3LDLmL0GfHi4LavoRpdU9ttg+1Zki6mynuzEDgyI24iIrqnqcXBbU8ZoPxU4NSmIouI\niJbIzNiIiB6Xhj4iosc11dBLOlrSXSU1wjGlbMDUCBER0XnNLA6+GXAo1azXF4DrJF09UGqE6A1J\ngbCkjEKK0aCZK/q/BW62/azthcAvgHf3bayTGiEiIrqgmYb+LmBHSWtLegUwmSUnSvVPjbCEpECI\niOiMETf0tmcDpwHTgOuA26nGzffpnxqh//5JgRAR0QFN3Yy1fbbtt9jeEXiCshB4TWqEi5oPMSIi\nmtHUmrGS1rE9T9J4qoZ9u7KpXmqE6AG5+Rgx+jTV0AOXSVobeJEq1cGfSvnLUiNERER3NNXQ295h\ngPIpzRw3IiJaJzNjIyJ6XBr6iIge12wf/YAkHU01c1bAt22f2a5zRecsSzNjc+M5ekVbruj7pUfY\nAthL0sbtOFdERAyuXV03g6ZHiIiIzmlXQz9UeoSkQIiI6JC2NPQNpEdICoSIiA5p26ibgdIjRERE\nZ7Vz1M1A6RFiFMtIlIjRp20NPQOnR4iIiA5qW0M/UHqEiIjorMyMjYjocWnoIyJ6XENdN5LOAfYC\n5tnerJR9HtgHeAmYB0yx/UjNPlsBNwP727601YFHd/RqCoTcZI5e1ugV/bnAnv3Kvmh7c9sTgauA\nE/s2SFqeahz99a0IMiIiRq6hht72jVRj4WvLnqp5+UrANa8/ClxGdaUfERFd1OxSgqcCHwCeBHYp\nZRtQ5bXZFdhqkH0PAw4DGD9+fDNhRETEIJpdHPwE2+OA84GjSvGZwHG2Fw2xb1IgRER0QKvG0f8A\nuBo4CZgEXCgJYAwwWdJC2z9s0bkiImIYRtzQS9rYdl/+mr2BuwFsv7amzrnAVWnke0dGp0SMPo0O\nr7wA2BkYI2kO1ZX7ZEmbUg2vfAg4vF1BRkTEyDXU0Ns+sE7x2Q3sN2W4AUVERGtlZmxERI9LQx8R\n0eOG7LqRNA44D1iXqj/+W7anlm0fpRpWuRC42vanJL0P+OeaQ2wOvMX2zFYHH53XSykQcmM5lhWN\n9NEvBD5h+zZJqwMzJE0DXkOV62Zz289LWgfA9vlU4+qR9GbgR2nkIyK6Z8iG3vZcYG55/rSk2cAG\nwKHAv9t+vmyrl+7gQOCC1oUbERHDNaw+ekkTgC2BW4BNgB0k3SLpFyVbZX/7M0BDL+kwSdMlTZ8/\nf/7woo6IiIY13NBLWo0qUdkxJaHZCsCrgG2p+uQvVpkOW+pvAzxr+656x0sKhIiIzmiooZe0IlUj\nf77ty0vxHOByV26lulE7pma3A0i3TURE1zUy6kZUk6Nm2z6jZtMPqTJU3iBpE2Al4LGyz3LAfsCO\nLY84uiojVSJGn0ZG3WwPHATcKalv9MyngXOAcyTdBbwAHGy7Lyf9jsAc2/e3OuCIiBieRkbd3ARo\ngM3vH2CfG6j67iMiossyMzYioseloY+I6HEjToEgaSLwDWAVqtmzH7F9q6SdgR8BD5RDXG77lHYE\nH503mlMg5EZyLKuaSYFwOnCy7WslTS6vdy77/NL2Xm2JOCIihqWZFAgG1ijV1gQeaVeQERExcsNa\nSrBfCoRjgOslfYmqr//vaqpuJ+l2qsb/k7Zn1TnWYcBhAOPHjx9J7BER0YBmUiAcARxrexxwLItX\nnLoN2Mj2FsB/Uk2sepmkQIiI6AwtnuM0SKUqBcJVwPV9s2MlPQmsZdtl9uyTtteos++DwCTbjw10\n/EmTJnn69OkjfAsREcsmSTNsTxqq3pBX9IOkQHgE2Kk83xW4p9Rfty+5maStyzkeH174ERHRKs2k\nQDgUmCppBeA5Sn87sC9whKSFwF+AA9zI14aIiGiLZlMgvLVO/bOAs5qMKyIiWiQzYyMielwjffTj\nJP1c0mxJsyQdXcr3K69fkjSppv7fS5oh6c7y767tfAMRETG4ZmbG3gW8B/hmv/qPAe+0/YikzYDr\nqSZYRQ8YjSkQkvoglnUjnhlrexpAzeqBffV/U/NyFrCKpJX7FhGPiIjOamZx8Eb8I/CbNPIREd3T\ncAqEOjNjh6r/JuA04B8G2J4UCBERHdDM4uCD1d8QuAL4gO376tVJCoSIiM5oZnHwgeqvBVwN/Ivt\nXzUfYixNcmMzYvRp5Iq+b2bsrpJmlsdkSe+WNAfYDrha0vWl/lHA64HP1tRfpz3hR0TEUJqdGXtF\nnfpfAL7QZFwREdEimRkbEdHj0tBHRPS4php6SatIulXS7SUdwsml/OxSdoekS8vQzIiI6IJhLSVY\nx/PArrafKUMwb5J0LdXKU08BSDqD6gbtvzd5rhiGdqUqyKibiNGnqYa+5Jl/prxcsTxc08gLWJVq\nIfGIiOiCpvvoJS1fFiSZB0yzfUsp/y/gj8AbqNaOjYiILmi6obe9yPZEYENg65KxEtsfBNYHZgP7\n999P0mGSpkuaPn/+/GbDiIiIAbRs1I3tBcANwJ41ZYuAi6iSm/WvnxQIEREd0FQfvaSxwIu2F0ha\nFdgdOF3S623fW/ro3wnc3YJYYxhy0zQi+jQ76mY94LuSlqf6dnAxVZ6bX0pag2pG7e3AEU2eJyIi\nRqjZUTd3UOWn72/7Zo4bERGtk5mxERE9Lg19RESPa3ThkXMkzZN0V51tn5RkSWP6lW8laZGkfVsV\nbEREDF+jffTnAmcB59UWShoH/D3wh37ly1MtI3g90XHtSn8AGc0TMRo1dEVv+0bgiTqbvgJ8ipen\nOPgo1dKD85qKLiIimjbiPnpJewP/Z/v2fuUbAO8GvjHE/pkZGxHRASNq6CW9AjgBOLHO5jOB48qs\n2AFlZmxERGeMdBz93wCvBW6vJr+yIXCbpK2BScCFpXwMMFnSQts/bEG8ERExTCNq6G3fCfx1wW9J\nDwKTbD9G9Qegr/xc4Ko08p2VG6YRUavR4ZUXAL8GNpU0R9KH2htWRES0SkNX9LYPHGL7hAHKpww/\npIiIaKXMjI2I6HFp6CMielxTDb2koyXdJWmWpGNK2eck/Z+kmeUxuTWhRkTESIw4TXFZMvBQYGvg\nBeA6SX1z779i+0stiC+GqZ3pDyAjeiJGo2by0f8tcLPtZwEk/YJqRmxERCxFmum6uQvYUdLaZabs\nZGBc2XaUpDtK1stX1ds5KRAiIjpjxA297dlUGSqnAddRLRm4EPg61czZicBc4MsD7J8UCBERHdDU\nzVjbZ9t+i+0dqbJb3mP7UduLbL8EfJuqDz8iIrqkqTVjJa1je56k8cB7gO0krWd7bqnybqounuiQ\n3CyNiP6aauiByyStDbwIHGn7T5K+J2kiVY76B4EPN3mOiIhoQlMNve0d6pQd1MwxIyKitTIzNiKi\nx6Whj4jocc3MjF0FuBFYuRznUtsnSTqfavGRF4FbgQ/bfrEVwcbLtXsmbH+52Rsx+jRzRf88sKvt\nLajGzO8paVvgfOANwJuBVYFDmo4yIiJGbMRX9LYNPFNerlgetn1NXx1Jt1ItMxgREV3SbPbK5SXN\nBOYB02zfUrNtReAgqlmz9fZNCoSIiA5odmbsItsTqa7aty4ZLft8DbjR9i8H2DcpECIiOqAlo25s\nLwBuAPYEkHQSMBb4eCuOHxERI9fMqJuxwIu2F0haFdgdOE3SIcAewG4l3020UUbBRMRQmpkZux7w\nXUnLU30zuNj2VZIWAg8Bv5YEcLntU5oPNSIiRqKZUTd3AFvWKW82f05ERLRQZsZGRPS4NPQRET1u\nyG4WSeOA84B1gZeAb9meKukiYNNSbS1gge2JkiYAs4HflW032z681YEvqzqd8qC/3PyNGH0a6U9f\nCHzC9m2SVgdmSJpme/++CpK+DDxZs899ZXx9RER02ZANfVktam55/rSk2cAGwG8BVA2t+Sdg1zbG\nGRERIzSsPvrSLbMlcEtN8Q7Ao7bvqSl7raTfSPqFpJctTlKOlRQIEREd0HBDL2k14DLgGNtP1Ww6\nELig5vVcYLztLalmxv5A0hr9j5cUCBERndFQQ18SlF0GnG/78pryFagWBb+or8z287YfL89nAPcB\nm7Qy6IiIaFwjo24EnA3Mtn1Gv827A3fbnlNTfyzwhO1Fkl4HbAzc38KYl2kZ9RIRw9XIFf32VOmG\nd5U0szwml20HsGS3DcCOwB2SbgcuBQ63/UTLIo6IiGFpZNTNTYAG2DalTtllVN08ERGxFMjM2IiI\nHpeGPiKixzVyM3YV4EZg5VL/UtsnSTofmAS8CNwKfNj2i+Xm7VRgMvAsMMX2be16A8uSbqc/gNwM\njhiNGrmifx7Y1fYWwERgT0nbAucDbwDeDKwKHFLqv51qpM3GwGHA11sddERENG7Iht6VZ8rLFcvD\ntq8p20x1Rb9hqbMPcF7ZdDOwlqT12hF8REQMrdEJU8tLmgnMA6bZvqVm24pUwy+vK0UbAA/X7D6n\nlPU/ZlIgRER0QEMNve1FJRvlhsDWkjar2fw14Ebbvyyv6w3FdJ1jJgVCREQHDGvUje0FwA3AngCS\nTgLGUuW06TMHGFfzekPgkaaijIiIEWtk1M1Y4EXbCyStSpX24DRJhwB7ALvZfqlmlyuBoyRdCGwD\nPFlSHUeTMuIlIkaikYVH1gO+K2l5qm8AF9u+StJC4CHg19WISi63fQpwDdXQynuphld+sC2RR0RE\nQxpJgXAHVQ76/uV19y2jcI5sPrSIiGiFzIyNiOhxaegjInpcIzdjzwH2AubZ3qyUTQS+AaxCtXj4\nR2zfKul9wHFl12eAI2zf3pbIe9TSkOZgMLkhHDH6NHJFfy5lOGWN04GTy9j6E8trgAeAnWxvDnwe\n+FaL4oyIiBFq5GbsjWVR8CWKgb51YNekjJO3/T81dW5mcVqEiIjokkaGV9ZzDHC9pC9RfSv4uzp1\nPgRcO9ABJB1GlfSM8ePHjzCMiIgYykhvxh4BHGt7HHAs1ZqyfyVpF6qG/rg6+wJJgRAR0SkjvaI/\nGDi6PL8E+E7fBkmbl9dvt/14c+Ete3KzMyJabaRX9I8AO5XnuwL3AEgaD1wOHGT7982HFxERzWpk\neOUFwM7AGElzgJOAQ4GpklYAnqP0tVONwFkb+FpJi7DQ9qQ2xB0REQ1qZNTNgQNsemuduoeweKWp\niIhYCmRmbEREj2uqoZf0oKQ7Jc2UNL2U7SdplqSXJKXbJiKiy0Y66qbWLrYfq3l9F/Ae4JstOPYy\nZWlPfwAZFRQxGrWioV+C7dkA5WZsRER0WbN99AZ+ImlGmekaERFLmWav6Le3/YikdYBpku62fWMj\nOyYFQkREZzR1RW+7L5nZPOAKYOth7JsUCBERHTDiK3pJrwSWs/10ef4PwCkti2wZlBudEdEOzVzR\nvwa4SdLtwK3A1bavk/TuMoN2O+BqSde3ItCIiBiZEV/R274f2KJO+RVU3TgREbEUyMzYiIgel4Y+\nIqLHDdnQSxon6eeSZpfUBkeX8otK6oOZJRXCzH77jZf0jKRPtiv4iIgYWiN99AuBT9i+TdLqwAxJ\n02zv31dB0peBJ/vt9xUGWUpwWTQaUhwMJSODIkafRtIUzwXmludPS5oNbAD8FkBVroN/olqAhFL2\nLuB+4M9tiDkiIoZhWH30kiYAWwK31BTvADxqu2+VqVdSrRV78hDHOkzSdEnT58+fP5wwIiJiGBpu\n6CWtBlwGHGP7qZpNBwIX1Lw+GfiK7WcGO15mxkZEdEZD4+glrUjVyJ9v+/Ka8hWoUhLXrja1DbCv\npNOBtYCXJD1n+6zWhR0REY1qZM1YAWcDs22f0W/z7sDdtuf0FdjeoWbfzwHPpJGv5EZmRHRDI103\n2wMHAbvWDKecXLYdwJLdNhERsZRpZNTNTUDdVURsTxli38+NKKqIiGiZzIyNiOhxaegjInpcMykQ\nPi/pjtJn/xNJ6/fbbytJiyTt267gIyJiaCNOgQB80fZnASR9DDgROLy8Xh44DehILvpeSC0wWmTk\nUMToM+QVve25tm8rz58GZgMb9Js09UqqhcL7fJRq3P28FsYaEREjMKyFR/qnQJB0KvABqoRmu5Sy\nDYB3U+W+2WqQY2Vx8IiIDmgqBYLtE2yPA84HjipVzwSOs71osOMlBUJERGc0lQKhxg+Aq4GTgEnA\nhdWEWsYAkyUttP3D1oQcERHDMeIUCJI27stYCewN3A1g+7U1dc4Frmp3I58bhBERA2vkir4vBcKd\nNatIfRr4kKRNgZeAhygjbiIiYunSTAqEaxrYd8oIYoqIiBaS7aFrtTsIaT7Vt4JWGQM81sLjtdto\ninc0xQqJt90Sb/s0EutGtocczbJUNPStJmm67UndjqNRoyne0RQrJN52S7zt08pYk+smIqLHpaGP\niOhxvdrQf6vbAQzTaIp3NMUKibfdEm/7tCzWnuyjj4iIxXr1ij4iIoo09BERPa4nGnpJr5Y0TdI9\n5d9X1akzUdKvy+Ipd0javxuxlliGjLfUu07SAklXdSHGPSX9TtK9ko6vs31lSReV7beUzKZd00C8\nO0q6TdLCpWExnAbi/bik35bf1Z9J2qgbcdbEM1S8h0u6syxEdJOkN3YjzhLLoLHW1NtXkiV1dbhl\nA5/tFEnzy2c7U9Ihwz6J7VH/AE4Hji/PjwdOq1NnE2Dj8nx9YC6w1tIab9m2G/BOqnxBnYxveeA+\n4HXASsDtwBv71fkI8I3y/AATlB0eAAADLklEQVTgoi7+/BuJdwKwOXAesG+3Yh1GvLsAryjPjxgF\nn+8aNc/3Bq5bWmMt9VYHbgRuBiYt5Z/tFOCsZs7TE1f0wD7Ad8vz7wLv6l/B9u9dkrDZfoRqUZRu\n5UceMl4A2z8Dnu5UUDW2Bu61fb/tF4ALqWKuVfseLgV2KwnwumHIeG0/aPsOqtxM3dZIvD+3/Wx5\neTOwYYdjrNVIvIMtRNRJjfzuAnye6oLruU4GV0ej8TalVxr619ieC9WKWMA6g1WWtDXVX8/7OhBb\nPcOKtws2AB6ueT2nlNWtY3sh1eIza3ckupdrJN6lyXDj/RBwbVsjGlxD8Uo6UtJ9VA3oxzoUW39D\nxippS2Cc7Y53idbR6O/CP5ZuvEsljRvuSYa1wlQ3SfopsG6dTScM8zjrAd8DDrbdtqu7VsXbJfWu\nzPtfoTVSp1OWplga0XC8kt5PtcbDTm2NaHANxWv7q8BXJb0X+AxwcLsDq2PQWCUtB3yFqjtkadDI\nZ/tj4ALbz0s6nOqb9K7DOcmoaeht7z7QNkmPSlrP9tzSkNddq1bSGlQLpHzG9s1tChVoTbxdNAeo\nvWrYEHhkgDpzJK0ArAk80ZnwXqaReJcmDcUraXeqC4OdbD/fodjqGe7neyHw9bZGNLChYl0d2Ay4\nofQ0rgtcKWlv29M7FuViQ362th+veflt4LThnqRXum6uZPHVw8HAj/pXkLQScAVwnu1LOhhbPUPG\n22X/C2ws6bXlczuAKuZate9hX+C/Xe4cdUEj8S5Nhoy3dC98E9jbdrcvBBqJd+Oal+8A7qE7Bo3V\n9pO2x9ieYHsC1f2PbjXy0Nhnu17Ny72B2cM+S7fuNrf4zvXawM+ofrl+Bry6lE8CvlOevx94EZhZ\n85i4tMZbXv8SmA/8heov/x4djHEy8Huq+xgnlLJTqP5TAKwCXALcC9wKvK7LvwNDxbtV+Qz/DDwO\nzFrK4/0p8GjN7+qVS3m8U4FZJdafA29aWmPtV/cGujjqpsHP9t/KZ3t7+WzfMNxzJAVCRESP65Wu\nm4iIGEAa+oiIHpeGPiKix6Whj4jocWnoIyJ6XBr6iIgel4Y+IqLH/X98Q4C0apigsAAAAABJRU5E\nrkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x107a8b710>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 3* Lasso\n",
    "lasso = LassoCV(alphas = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000], \n",
    "                max_iter = 5000)\n",
    "lasso.fit(X_train, y_train)\n",
    "alpha = lasso.alpha_\n",
    "print(\"Best alpha :\", alpha)\n",
    "\n",
    "print(\"Try again for more precision with alphas centered around \" + str(alpha))\n",
    "lasso = LassoCV(alphas = [alpha * .6, alpha * .65, alpha * .7, alpha * .75, alpha * .8, \n",
    "                          alpha * .85, alpha * .9, alpha * .95, alpha, alpha * 1.05, \n",
    "                          alpha * 1.1, alpha * 1.15, alpha * 1.25, alpha * 1.3, alpha * 1.35, \n",
    "                          alpha * 1.4], \n",
    "                max_iter = 50000, cv = 10)\n",
    "lasso.fit(X_train, y_train)\n",
    "alpha = lasso.alpha_\n",
    "print(\"Best alpha :\", alpha)\n",
    "\n",
    "mse_cv = np.mean(lasso.mse_path_, axis = 0)\n",
    "rmse_cv = np.sqrt(mse_cv)\n",
    "print(\"cv of rmse :\", min(rmse_cv))\n",
    "\n",
    "y_train_lasso = lasso.predict(X_train)\n",
    "rmse_train = np.sqrt(mean_squared_error(y_train,y_train_lasso))\n",
    "print(\"RMSE on Training set :\", rmse_train)\n",
    "\n",
    "# Plot important coefficients\n",
    "coefs = pd.Series(lasso.coef_, index = X_train.columns)\n",
    "print(\"Lasso picked \" + str(sum(coefs != 0)) + \" features and eliminated the other \" +  \\\n",
    "      str(sum(coefs == 0)) + \" features\")\n",
    "imp_coefs = pd.concat([coefs.sort_values().head(10),\n",
    "                     coefs.sort_values().tail(10)])\n",
    "imp_coefs.plot(kind = \"barh\")\n",
    "plt.title(\"Coefficients in the Lasso Model\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 对测试集进行测试，生成提交文件"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "y_test_pred = lasso.predict(X_test)\n",
    "y_test_pred = y_test_pred * std_y +  mean_y\n",
    "\n",
    "#生成提交测试结果\n",
    "\n",
    "#df = pd.DataFrame({\"Id\":test_Id, 'SalePrice':y_test_pred})\n",
    "#df.reindex(columns=['Id'])\n",
    "y = pd.Series(data = y_test_pred, name = 'SalePrice')\n",
    "df = pd.concat([test_Id, y], axis = 1)\n",
    "df.to_csv('submission.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 1459 entries, 0 to 1458\n",
      "Data columns (total 2 columns):\n",
      "Id           1459 non-null int64\n",
      "SalePrice    1459 non-null float64\n",
      "dtypes: float64(1), int64(1)\n",
      "memory usage: 22.9 KB\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "_change_revision": 0,
  "_is_fork": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
